Shanghai AI Lab

type: Organization · mentions: 1

// recent coverage: 1 mention

00:02

2026-06-29

ianbarber.blog

large-language-models

It’s always the learning rates

Scaling laws predict training loss as model size, dataset size, and compute scale, but their practical application is sensitive to hyperparameter choices like learning rate. Lilian Weng's post highlig…

// top 7 co-occurring entities

Lilian Weng 1 Yang 1 Hu 1 Zhou 1 Xing 1 BERT 1 GPT-3 1